Version Control with RStudio and GitHub

Author

Joshua P. French

Published

December 14, 2024



Data scientists should have an intentional process for tracking changes in their code over time.

In ancient times, like the early 2000s, we simply renamed our files manually.

In modern times, version control is automatic in many contexts.

The industry standard for version control of computer code and software is git. We will learn how to use git from within RStudio while using GitHub to host our code and version control data.

What is git?

Software developers need a fast, reliable way of tracking changes to their source code when working as individuals or as part of a team.

The best-in-class approach to software version control is git, which was created by Linus Torvalds in 2005 to improve on the version control applications available at the time.

From https://git-scm.com/:

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

By default, git tracks files changed in a directory or repository locally, i.e., on our actual computer, which can create some potential challenges:

  • git does not back up our code or changes, which would be problematic if our computer becomes compromised.
  • A centralized repository must be available when multiple contributors are working on the same project. There needs to be a way of easily managing each person’s contributions and keeping track of the development and stable branches of the code.

GitHub is the most popular solution for these two issues.

What is GitHub?

GitHub (https://www.github.com) is an online platform for hosting git repositories.

GitHub makes it easy to:

  • Control who is able to access and change a repository.
  • Track bugs in our code.
  • Keep a record of feature requests.
  • Delegate responsibilities.
  • Maintain a wiki.
  • etc.

GitHub is a great place to:

  • Back up and maintain version control of our software.
  • Make our software public and allow others to contribute and interact with our work.
  • Demonstrate our coding skills and abilities by creating publicly available project repositories.
  • Host a professional website for potential employers or clients.

Linking RStudio and GitHub

We can use RStudio to manage a local git repository and push the changes to GitHub.com.

We outline the setup process below.

We assume that both R (https://cran.r-project.org/) and RStudio (https://posit.co/downloads) are installed.

Create a GitHub account

First, we create a GitHub account.

  • Choose an email address to associate with your GitHub account.
    • Educational users with a .edu email address can upgrade for free to a GitHub Pro account.
  • Choose a strong password.
  • Select a username.
    • Think long term about this username. It may be an account that follows you for much of your professional life!
    • Good choices are related to your name.
      • E.g., if my name is John Q Smith, some good options might be jsmith, jqsmith, john-smith, john-q-smith, etc.
    • Bad choices are not professional or unique.
      • E.g., jsmith198451, i-love-cereal, PartyKing.
  • Continue the registration process.

Install git

Next, we install a version of git on our computer.

  • It might be wise to first check whether git is already installed on your computer.
  • To do so, follow the steps in the “Verifying communication between git and RStudio” section below before installing anything.

Download and install the appropriate version of git from https://git-scm.com/downloads.

In general, it is best to install git using all the default settings.

Verifying communication between git and RStudio

We need to make sure that RStudio is aware of the location of the git program on our computer.

  • Navigate to Global Options (under Tools for Windows machines and currently under Edit → Preferences for a Mac).

  • Click Git/SVN.

  • Note whether RStudio has automatically discovered the file path of the git executable in the “Git executable” field.

  • If the Git executable field is empty or wrong, then you will need to figure out the issue. You can try:

    • Reinstalling git.
    • Using a different approach for installing git.
    • Manually locating the git installation on your computer and using that file path.
    • Using web searches to find solutions that have worked for other people.

To verify that RStudio has the correct path to git:

  • Open a new Terminal window by pressing Alt + Shift + R on a Windows computer or Option + Shift + R on a Mac.
  • Type git --version in the Terminal window and press enter.
  • If RStudio is looking in the correct location for git, then the currently installed version of git will be printed in the Terminal window.
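
The same check can be scripted. The following is a minimal sketch, assuming a POSIX-style shell (for example, Git Bash on Windows or the default Terminal on a Mac); the variable name git_path is ours, for illustration only.

```shell
# Look for git on the PATH and report where it lives and which version it is.
if git_path=$(command -v git); then
  echo "git found at: $git_path"
  git --version
else
  git_path=""
  echo "git was not found on the PATH"
fi
```

If a path is printed, it is a reasonable candidate for the “Git executable” field in RStudio.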

Authorizing RStudio to access GitHub

Next, we make sure that RStudio can talk with GitHub.

We must configure our username and email address using git config in the Terminal window.

We need to run the following commands in the Terminal window (replacing myusername and myemail@email.edu with your specific username and email address):

git config --global user.name 'myusername'
git config --global user.email 'myemail@email.edu'
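
To confirm the settings took effect, we can ask git to print them back. This is a small sketch, assuming a POSIX-style shell; the variable names and fallback messages are illustrative only.

```shell
# Read back the global identity. Each lookup prints nothing and exits
# non-zero when the value is unset, so we fall back to a short message.
name_out=$(git config --global user.name || echo "user.name is not set")
email_out=$(git config --global user.email || echo "user.email is not set")

echo "user.name:  $name_out"
echo "user.email: $email_out"
```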

To link RStudio with GitHub, we need to set our GitHub credentials in RStudio.

Install the gitcreds package.

  • Run install.packages("gitcreds") in the R Console.
  • Run library(gitcreds) in the R Console to attach the gitcreds package.

We need to create a Personal Access Token to link RStudio with GitHub.

  • Click on your Profile photo.

  • Click Settings.

  • Click Developer settings.

  • Under Personal Access Tokens, click Tokens (classic), then click Generate new token and choose the classic option.

  • Write a Note (perhaps which computer the access is being granted to).
  • Select when the token will expire (shorter is more secure).
  • Select the access (scopes) to grant; to be cautious, grant only what you need. Make sure to select repo access!
  • Click Generate token at the bottom of the page.

  • Copy the token.

  • Run gitcreds_set() in the R Console.
  • Paste the token you copied into the Console and hit Enter.

RStudio/git should be linked with GitHub!

Using version control for an RStudio Project

There are different ways to manage the version control workflow from within RStudio.

We present the most basic workflow, which is to enable version control for a new project cloned from an existing GitHub repository.

We will do this by creating a new, empty repository, but you can do the same thing with an existing repository as long as you have the URL for the repository you want to clone.

  • Create a new repository on GitHub by clicking New on your GitHub homepage.

  • Name your new repository, make it public, and click Create Repository. We will create a new repository named “mytest”.

  • Copy the URL for the repository.

Next, create a New Project using version control.

  • Click File → New Project.

  • Click Version Control.

  • Click Git.

  • Paste the URL in the “Repository URL” field and click Create Project.

If everything is connected correctly, the newly cloned project should include a Git tab in the Environment pane.

The basic workflow when using RStudio to do version control is:

  • Add/edit files in the current project.
  • Add the appropriate files to the staging area.
  • Commit the changes to the local git repository and provide a short description of the changes made.
  • Push the changes from the local repository to a remote repository (GitHub).

We provide a simple example.

Change some code

  • Open a new R Script (Ctrl + Shift + N on a Windows computer or Cmd + Shift + N on a Mac).
  • Add a single command to the first line of the file: print("Hello, world!").
  • Save the file as test.R.

Stage the changes

In the Git tab of the Environment pane:

  • Check the box next to “test.R”.
  • Click the Commit button.

Commit the changes to the local repository

  • Review changes.
  • Add a Commit message.
    • This is a brief description of the changes made in the commit.
  • Click Commit.

Push the changes to the remote repository

After you close the Commit window, click Push in the Git tab of the Environment pane to push the changes from the local repository to GitHub.

The changes should be pushed to your GitHub repository, which you can see if you go to that repository on your GitHub page.

  • We can see that test.R has been added to our “mytest” repository on GitHub.
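
The same stage, commit, and push cycle can also be run from the Terminal. The sketch below is a rehearsal under stated assumptions: it works in a throwaway directory created with mktemp, and the name, email address, and commit message are placeholders. The push step is left as a comment because the scratch repository has no remote.

```shell
# Work in a throwaway directory so no real project is touched.
scratch=$(mktemp -d) && cd "$scratch"
git init -q
git config user.name "Example User"          # placeholder identity, local to this repo
git config user.email "example@example.com"

# 1. Add/edit files.
echo 'print("Hello, world!")' > test.R
git status --short                           # test.R is listed as untracked ("??")

# 2. Stage the file (the checkbox in the Git tab).
git add test.R
git status --short                           # test.R is now listed as added ("A ")

# 3. Commit with a short message.
git commit -q -m "Add test.R"
git log --oneline

# 4. Push (the Push button); needs a configured remote, so not run here.
# git push
```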

Adding RStudio version control to a local RStudio project

Suppose we want to use version control for a local RStudio project that isn’t already linked to a git repository.

A local RStudio project can be created in two ways.

Creating a New Project with a New Directory

We first discuss creating a new RStudio project with a new directory.

  • Create a New Project by first clicking File → New Project.

  • Select New Project.

  • Select New Directory.

  • Name the new directory using the Directory name field (“test2” in our case), uncheck the “Create a git repository” box (we will do this manually), and click Create Project.

Creating a New Project from an existing directory

We now discuss creating a new RStudio project from an existing directory.

  • Create a New Project by first clicking File → New Project.

  • Select Existing Directory.

  • Browse to the existing folder we want to turn into a project and click Create Project.

Adding version control

To add GitHub version control to an existing RStudio project, we need a place to store the remote repository.

First, create a new repository on GitHub.com to store the project.

  • Usually, the repository name is the same as your project name, but it doesn’t have to be.
  • In our example, the project name and GitHub repository name are both “test2”.

Next, open a Terminal window in RStudio and run the following commands:

  • Run git init to initialize git for this project.
    • You only need to do this if no git repository is already initialized.
    • If you checked “Create a git repository” when creating the New Project, then you can skip this step.
  • Run git add --all to stage all project files.
  • Run git commit -m "initial commit" to make our first commit.
  • Run git branch -M main to rename the current branch to “main” (the default branch name on GitHub).
  • Run git remote add origin https://github.com/username/reponame.git to link the remote repository to the local repository.
    • Replace username and reponame with the correct text.
  • Run git push -u origin main to push the commit from the local repository to GitHub.

In order, the commands are:

git init
git add --all
git commit -m "initial commit"
git branch -M main
git remote add origin https://github.com/username/reponame.git
git push -u origin main
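
To rehearse this sequence without touching GitHub, a local bare repository can stand in for the remote. This is only a sketch under that assumption: the scratch paths, the file contents, and the identity are placeholders, and with a real project the remote would be the GitHub URL shown above.

```shell
# Scratch area; nothing here touches real projects or GitHub.
scratch=$(mktemp -d)
git init -q --bare "$scratch/remote.git"     # local stand-in for the GitHub repository

# A tiny project directory with one file.
mkdir "$scratch/test2" && cd "$scratch/test2"
echo 'print("Hello, world!")' > test.R

git init -q
git config user.name "Example User"          # placeholder identity for the demo only
git config user.email "example@example.com"
git add --all
git commit -q -m "initial commit"
git branch -M main
git remote add origin "$scratch/remote.git"
git push -q -u origin main

# Check the result: the remote URL and the branch's upstream.
git remote -v
git status --short --branch
```

After the push, git remote -v should list origin, and the status line should show main tracking origin/main.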

If we close and reopen the project in RStudio, then we should see the Git tab in the Environment pane.

Final thoughts

This is a small tutorial for using RStudio for version control with git and GitHub.

If you run into problems, https://happygitwithr.com/ is a more in-depth resource for solving them.